To comprehend the hierarchical organization of large integrated systems, we introduce the hierarchical map equation, which reveals multilevel structures in networks. In this information-theoretic approach, we exploit the duality between compression and pattern detection; by compressing a description of a random walker as a proxy for real flow on a network, we find regularities in the network that induce this system-wide flow. Finding the shortest multilevel description of the random walker therefore gives us the best hierarchical clustering of the network (the optimal number of levels and modular partition at each level) with respect to the dynamics on the network. With a novel search algorithm, we extract and illustrate the rich multilevel organization of several large social and biological networks. For example, from the global air traffic network we uncover countries and continents, and from the pattern of scientific communication we reveal more than 100 scientific fields organized in four major disciplines: life sciences, physical sciences, ecology and earth sciences, and social sciences. In general, we find shallow hierarchical structures in globally interconnected systems, such as neural networks, and rich multilevel organizations in systems with highly separated regions, such as road networks.



Introduction 

Ever since Aristotle, organization and classification have been cornerstones of science. In network science (1, 2), categorization of nodes into modules with community-detection algorithms has proven indispensable for comprehending the structure of large integrated systems (3-5). But in real-world networks, the organization is rarely limited to two levels, and modular descriptions can only provide cross sections of much richer structures. For example, both biological and social systems are often characterized by hierarchical organization with submodules in modules over multiple scales (6-10).

Several network clustering algorithms generate hierarchical trees, but few make more than a single cut through the dendrogram. To extract multiple levels of the network structure (9-12), the common approach is to first generate a dendrogram or group nodes with one method and then determine the multiple cuts or the resolution thresholds with a different method. Moreover, these methods approach the problem of community detection by inferring a model of an underlying generative process that created the network. That is, they view the real network structure as a realization of a probabilistic process that creates links between groups of nodes and try to identify the most likely underlying grouping. While this may be the appropriate strategy when one is fundamentally interested in the modular nature of the dynamics by which a given network was formed, it may not be optimal when one is more interested in understanding the subsequent dynamics or behavior that occur on the real network (13).

In many real-world networks, directed and weighted links represent the constraints that the structure of a network places on dynamical processes taking place on this network. Networks thus often represent literal or metaphorical flows: people surfing the web, passengers traveling between airports, ideas spreading between scientists, funds passing between banks, and so on. This flow through a system makes its components interdependent to varying extents. The objective of our hierarchical clustering approach, therefore, is to reveal the multiple levels of interdependence between the nodes of a network with a single method: a method that does not require multiple external resolution parameters, but rather inherently reveals the natural multiple levels of the system.

In this paper, we generalize the flow-based and infor- 
mation theoretic clustering method called the map equa- 
tion (14, 15) to uncover important multilevel structures 
and their relationships in networks. This generalization 
yields the hierarchical map equation, which provides a 
natural answer to three questions: Into how many hier- 
archical levels is a given network organized? How many 
modules are present at each level? And which nodes 
are members of which modules? Here we focus on hard 
partitions and flow of random walkers; we postpone the 
natural extension of this approach to overlapping parti-
tions and generalized flows to a subsequent paper. We
begin by briefly reviewing the map equation, and then
introduce the hierarchical map equation, of which our 
earlier two-level map equation (14, 15) can be seen as 
a special case. We then illustrate the mechanics of the 
hierarchical map equation, and extract and depict the 
hierarchical structure of several large-scale networks. Fi- 
nally, in the Materials and Methods section, we provide a 
detailed description and a performance test of our novel 
recursive search algorithm. 



Results and Discussion 
The two-level map equation 

We have recently introduced the map equation to sim- 
plify and highlight important structures with respect to 
the dynamics on networks. This approach uses a random 
walk as a proxy for the real flow (14, 15), and exploits 
the duality between compressing a message and finding 
patterns in the structure that generates that message 
(16, 17). To find the regularities that induce the dy- 
namics on networks, the map equation measures, for a 
given network partition, the per-step average description 
length of a random walker moving along the (weighted 
and directed) links between the nodes of a network. By 
minimizing the map equation over all possible network 
partitions, we can reveal the structures that generate the 
flow on the network. 

The map equation is designed to capitalize on the modular structure of a network; the description length of the dynamics on the network can be compressed if the network has localized regions in which small groups of nodes have long persistence times. Compression is achieved by using multiple module codebooks with reused short codewords for different nodes in the network. To make the compressed description unambiguous, an index codebook distinguishes which module codebook is active. Specifically, for a module partition $\mathsf{M}$ of $n$ nodes $\alpha = 1, 2, \dots, n$ into $m$ modules $i = 1, 2, \dots, m$, the lower bound on the code length $L(\mathsf{M})$ is the sum of the average length of codewords for each codebook weighted by the rate of use of each codebook. Shannon's source coding theorem (18) states that, when we use $n$ codewords to describe the $n$ states of a random variable $X$ that occur with frequencies $p_i$, the average length of a codeword can be no less than the entropy of the random variable $X$ itself: $H(X) = -\sum_i p_i \log(p_i)$ (we measure code lengths in bits and take the logarithm in base 2). This gives us the map equation:

$$
L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{m} p^{i}_{\circlearrowright} H(\mathcal{P}^{i}). \tag{1}
$$

$H(\mathcal{Q})$ is the frequency-weighted average length of codewords in the index codebook, and $H(\mathcal{P}^{i})$ is the frequency-weighted average length of codewords in module codebook $i$. Further, the entropy terms are weighted by the rate at which the codebooks are used. With $q_{i\curvearrowright}$ for the probability of exiting (and entering) module $i$, the index codebook is used at a rate $q_{\curvearrowright} = \sum_{i=1}^{m} q_{i\curvearrowright}$, which is the probability that the random walker switches modules on any given step. With $p_\alpha$ for the probability of visiting node $\alpha$, module codebook $i$ is used at a rate $p^{i}_{\circlearrowright} = \sum_{\alpha \in i} p_\alpha + q_{i\curvearrowright}$, the fraction of time the random walker spends in module $i$ plus the probability that she exits the module and the exit message is used. We have provided an interactive and dynamic visualization of the mechanics of the map equation here: www.mapequation.org.
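The sketch below evaluates Eq. 1 for an undirected, unweighted network. It is a minimal illustration, not the authors' implementation, and the function names are hypothetical. It assumes that the stationary visit rate of a node is its degree divided by the total degree, and that the exit rate of a module is the number of links crossing its boundary divided by the total degree. This is the "counting links and normalizing" procedure used for the example network in Fig. 1.

```python
import math
from collections import defaultdict


def entropy_term(rates):
    """Rate-weighted entropy of one codebook: total * H(rates / total), in bits.
    Zero rates are skipped (0 log 0 = 0)."""
    total = sum(rates)
    if total == 0:
        return 0.0
    return -sum(r * math.log2(r / total) for r in rates if r > 0)


def two_level_map_equation(edges, partition):
    """Eq. 1 for an undirected, unweighted graph (hypothetical helper).

    edges: list of (u, v) pairs; partition: dict node -> module label.
    """
    total_degree = 2 * len(edges)
    degree = defaultdict(int)
    exit_links = defaultdict(int)  # links crossing each module boundary
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            exit_links[partition[u]] += 1
            exit_links[partition[v]] += 1

    modules = defaultdict(list)
    for node, module in partition.items():
        modules[module].append(node)

    # For undirected links, entering and exiting rates of a module coincide.
    q_exit = {i: exit_links[i] / total_degree for i in modules}
    index_term = entropy_term(list(q_exit.values()))  # q H(Q)

    module_terms = 0.0
    for i, nodes in modules.items():
        rates = [q_exit[i]] + [degree[a] / total_degree for a in nodes]
        module_terms += entropy_term(rates)  # p^i H(P^i)
    return index_term + module_terms


if __name__ == "__main__":
    # Two triangles joined by a single link.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    one_module = {a: 0 for a in range(6)}
    two_modules = {a: 0 if a < 3 else 1 for a in range(6)}
    print(two_level_map_equation(edges, one_module))   # about 2.557 bits
    print(two_level_map_equation(edges, two_modules))  # about 2.321 bits
```

In the toy example, assigning each triangle to its own module shortens the description. Most steps stay within a triangle, so the short reused module codewords outweigh the cost of the occasional index codeword.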

Figure 1A illustrates the partitioning obtained by using 
the two-level map equation. The 27-node example net- 
work is partitioned into nine modules, and the descrip- 
tion length is theoretically 3.57 bits. For comparison, 
a single-module description of the network (one module 
codebook and no index codebook) has a lower bound of 
4.75 bits. 

When driven by a strong search algorithm, the map 
equation provides an efficient tool for revealing the mod- 
ular structure of networks (19). But many networks have 
important structures at multiple scales (3), and the code
structure of the two-level map equation cannot capital- 
ize on these. For example, the network in Fig. 1A is hi- 
erarchically organized with submodules within modules, 
but the two-level map equation cannot simultaneously 
capitalize on both the module and submodule levels of 
structure. It minimizes code length by partitioning at 
the submodule level, revealing nine modules as shown 
in Fig. 1A. Additional potential for compression from 
the module level structure goes untapped, and thus ad- 
ditional structure at the module level goes unreported. 



The hierarchical map equation 

To reveal patterns at multiple levels, we must generalize the coding structure upon which the two-level map equation is based. Figure 1B shows a hierarchical description of the network with not one but two index codebooks, one for each level of hierarchy. With this code structure, the description length can be reduced from the 3.57 bits required by the two-level map equation to 3.48 bits, because the average description length to determine which of the nine module codebooks is active has been reduced by 0.09 bits per step. The extra codebook makes it possible to exploit the fact that the fine-level modules are themselves organized into larger modules: once a random walker enters one of the three larger modules, she tends to stay there for a long time.

FIG. 1 Minimizing the map equation over all network partitions gives an optimal clustering of the network with respect to the dynamics on the network. Optimal two-level clustering is shown in A and hierarchical clustering is shown in B. The description length, which is 4.75 bits for an unpartitioned network, is the sum of the average length of codewords from the index codebook(s) and the module codebooks weighted by the rate of use of each codebook. For this undirected unweighted network with total degree 78, all rates can be calculated by counting links and normalizing: the codewords of the index codebook in A are used at relative rates $\mathcal{Q}$ at a total rate $q_{\curvearrowright}$ and, for example, the codewords of the first module codebook are used at relative rates $\mathcal{P}^{1}$, which include the exit codeword used at the exit probability $q_{1\curvearrowright}$. The codewords of the smaller index codebooks in B are used at relative rates $\mathcal{Q}$ and $\mathcal{Q}^{1}$ at total rates $q_{\curvearrowright}$ and $q^{1}_{\circlearrowright}$. The fine-level modules of this hierarchical clustering coincide with the modules of the two-level clustering.

Broadly, in the hierarchical map equation we release the constraint of a single index codebook and allow for an arbitrary number of hierarchically nested index codebooks that specify movements between modules, submodules, subsubmodules, and so on, down to the finest modular level. Formally, for a hierarchical map $\mathsf{M}$ of $n$ nodes partitioned into $m$ modules, for which each module $i$ has a submap $\mathsf{M}^{i}$ with $m^{i}$ submodules, for which each submodule $ij$ has a submap $\mathsf{M}^{ij}$ with $m^{ij}$ submodules, and so on, the hierarchical map equation takes the form

$$
L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{m} L(\mathsf{M}^{i}), \tag{2}
$$

with the description length of submap $\mathsf{M}^{i}$ at intermediate levels given by

$$
L(\mathsf{M}^{i}) = q^{i}_{\circlearrowright} H(\mathcal{Q}^{i}) + \sum_{j=1}^{m^{i}} L(\mathsf{M}^{ij}) \tag{3}
$$

and at the finest modular level by

$$
L(\mathsf{M}^{ij\dots k}) = p^{ij\dots k}_{\circlearrowright} H(\mathcal{P}^{ij\dots k}). \tag{4}
$$

At each submodule level, $q^{i}_{\circlearrowright}$ is the rate of codeword use for entering the $m^{i}$ submodules or exiting to a coarser level, and $H(\mathcal{Q}^{i})$ is the frequency-weighted average length of the codewords in the subindex codebook. At the finest level, $p^{ij\dots k}_{\circlearrowright}$ is the rate of codeword use for visiting nodes in submodule $ij\dots k$ or exiting to a coarser level, and $H(\mathcal{P}^{ij\dots k})$ is the frequency-weighted average length of the codewords in the submodule codebook. To find the hierarchical structure that best represents the network with respect to flow, we seek the hierarchical partition of the network that minimizes the hierarchical map equation over all possible hierarchical partitions of the network (see Materials and Methods for a detailed description and a performance test of the algorithm). Figure 1B illustrates the optimal hierarchical partition and the corresponding code structure for the example network.
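The recursion in Eqs. 2-4 can be sketched as a function over a nested-list hierarchy. This is a hedged illustration with hypothetical names, not the authors' code. It assumes an undirected, unweighted network, so rates come from link counts as in the Fig. 1 example, and the flow into a module equals the flow out of it. In the assumed representation, a finest-level module is a list of node ids, an intermediate module is a list of submodules, and the root is the list of coarsest-level modules.

```python
import math


def codebook_length(rates):
    """Rate-weighted codebook entropy, total * H(rates / total), in bits."""
    total = sum(rates)
    if total == 0:
        return 0.0
    return -sum(r * math.log2(r / total) for r in rates if r > 0)


def is_finest(module):
    """A finest-level module is a list of node ids (no nested lists)."""
    return all(not isinstance(x, list) for x in module)


def hierarchical_map_equation(edges, tree):
    """Eqs. 2-4 for an undirected, unweighted graph (hypothetical helper)."""
    total_degree = 2 * len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    def members(module):
        if is_finest(module):
            return set(module)
        return set().union(*(members(sub) for sub in module))

    def exit_rate(nodes):
        # Links with exactly one endpoint inside, normalized by total degree.
        crossing = sum((u in nodes) != (v in nodes) for u, v in edges)
        return crossing / total_degree

    def length(module, exit_flow):
        if is_finest(module):
            # Eq. 4: one exit codeword plus one codeword per node.
            rates = [exit_flow] + [degree.get(a, 0) / total_degree for a in module]
            return codebook_length(rates)
        # Eqs. 2-3: the (sub)index codebook encodes exits to the coarser
        # level and entries into each submodule; then recurse.
        sub_exits = [exit_rate(members(sub)) for sub in module]
        index_term = codebook_length([exit_flow] + sub_exits)
        return index_term + sum(length(sub, q) for sub, q in zip(module, sub_exits))

    return length(tree, 0.0)  # the whole network has no exit


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(hierarchical_map_equation(edges, [[0, 1, 2], [3, 4, 5]]))      # about 2.321
    print(hierarchical_map_equation(edges, [[[0, 1, 2]], [[3, 4, 5]]]))  # about 2.606
```

With a single level of finest modules under the root, the function returns the two-level value of Eq. 1. Wrapping each module in a redundant intermediate level only adds codewords, so the longer description is correctly rejected. An extra level pays off only when the finer modules are themselves grouped, as in Fig. 1B.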



Figure 2 panels: A, Science; B, Global air traffic; C, Human diseases.

FIG. 2 Multilevel organization in three real-world networks. The bottom row illustrates structures that a two-level clustering can capture. The width of the horizontal lines represents the size of the modules, and the number to the left of the braces gives the number of submodules within each module. For visual simplicity, we exclude submodules with less than one per mille (0.1 percent) of all flow. See Fig. 3 for a hierarchical map of science based on the journal citation network.



Multilevel organization in real-world networks 

The hierarchical map equation can reveal rich multi- 
level organization in real-world networks. Figures 2A-C 
provide thumbnail illustrations of the hierarchical struc- 
ture of the journal citation network of science (20), the 
global air traffic network (21), and the human disease 
network (22). For comparison, Figures 2D-F show the 
structure of each network as characterized by the two- 
level map equation. 

The journal citation network traces more than nine 
million citations among nearly 8,000 journals in the sci- 
ences and social sciences. From the pattern of citations, 
we reveal more than 100 scientific fields organized in four 
major disciplines: life sciences, physical sciences, ecology 
and earth sciences, and social sciences. The physical sci- 
ences are in turn organized into physics and chemistry, 
with 35 subfields, and mathematics, with 24 subfields 
(see Fig. 3). 

In the global air traffic network, two cities are considered connected if a regularly scheduled commercial passenger flight travels between them. From the network of 3,883 cities connected by 14,142 links, the algorithm uncovers an overall organization of cities grouped in countries and countries grouped in continents. For example, the largest module comprises European and African cities arranged into 55 submodules; the second largest module comprises North and South American cities organized into 75 submodules. These submodules represent the Eastern US cities, the Western US cities, Mexican cities, and so on.

For the familiar networks of science and global air traf- 
fic, the organization revealed by the hierarchical map 
equation is intuitive and anticipated. But for the hu- 
man disease network that connects diseases if they share 
common genes (22), the outcome is quite different. In 
the hierarchical partition of this network, the submod- 
ules contain class-related diseases, but only the largest 
module, which groups different cancers together, is com- 
patible with any natural classification of diseases. We 
interpret this as an effect of missing data and a bias to- 
ward studies on oncogenes and other genes associated 
with cancer. 

FIG. 3 A hierarchical map of science. We partitioned 7,940 journals connected by 9.2 million citations (20) into four major disciplines, which we identified as life sciences, physical sciences, ecology and earth sciences, and social sciences. In physical sciences, we followed a second-level split into the areas of mathematics and of physics and chemistry. The size of the modules represents the fraction of time that a random surfer spends following citations in that field, and the arrows indicate flow volume between the fields. For visual simplicity, we exclude fields and arrows with low flow.

Beyond these three examples, many real-world networks have rich hierarchical structures. To illustrate, we have used the hierarchical map equation to partition twelve networks, ranging in size from hundreds to millions of nodes. In Table I, these networks are listed in descending order according to the magnitude of the compression gained by using a multilevel partitioning instead of a two-level partitioning. In general, we find shallow hierarchical structures in globally interconnected systems and rich multilevel organizations in systems with highly separated regions.

The network with the highest compression gain (i.e., the network with the greatest degree of nested hierarchical structure) is the California road network (23). The geographical constraints of the road network prevent shortcuts between different and remote parts of the network. As a result, the organization remains distinct all the way down to the many small bottom-level modules. The web graphs have the next greatest compression gain. They are as deep as the road network, but without physical constraints, different parts of the web are presumably more interconnected. The lowest-level modules are on average larger, and the flow between different large-scale regions reduces the compression gain.

At the other extreme of Table I are the C. elegans brain network (27) and the weighted and directed network of US air travel passengers (26), which were best compressed by two-level descriptions. The many links between different regions at a global scale of these networks maintain high connectivity and short distances, and prevent further gain from a multilevel description. For the same reason, the dual road network of Stockholm (25), with roads as nodes and intersections as edges, has a less pronounced multilevel structure than the road network of California (23), with intersections as nodes and roads as links; here the different representations overshadow differences in the actual road layouts. For example, a main road that intersects with many streets in several suburbs forms a hub that connects suburban streets in the dual representation. Therefore, the gain from a deep multilevel description is lost in the dual representation, which suppresses distances and makes the network more interconnected. When comparing the hierarchical depth of the road network of California and the dual road network of Stockholm, the geographic extent of the networks also plays an important role. Both networks represent streets in neighborhoods in suburbs, but the road network of California also includes the additional level of multiple cities. In this way, and because the number of nodes in a network grows quickly with every additional level of nested modules, there is a general trend in Table I for hierarchical depth to increase with network size.

TABLE I The hierarchical organization of real-world networks. For each multilevel classification of a network with $n$ nodes and $l$ links, we report the total number of modules $m$ together with the number of modules with more than one percent of all nodes, the per-node average depth, the per-node average size of the lowest-level module, and the compression gain over a two-level clustering. The networks are ordered by the compression gain, which provides information about how hierarchical the organization is.

Figure 2 and Table I summarize the extent of hierar- 
chical structure found in several large networks, but they 
provide no information about the relationships among 
the modules at any given level. To comprehend the dy- 
namics of a system, we must capture both its hierarchical 
structure and the connections among modules at all levels 
of structure. Because the hierarchical map equation nat- 
urally balances the persistence times in modules and the 
flow between modules when it exploits the regularities in 
patterns of movement on a network, both are intrinsic 
to our approach. In Fig. 3, we illustrate the relation- 
ships among modules in a hierarchical map of science. 
The multilevel map highlights and simplifies the citation 
flow between the major disciplines. At the same time, it 
summarizes the flows between fields that integrate those 
fields into larger disciplinary areas; for example, the ar- 
rows indicate the flows among the fields composing the 
social sciences. If a researcher were to take a random
walk in the scholarly literature by reading a paper and
following a random citation to a new paper, she would 
spend 54 percent of her time reading journals in the life 
sciences, 33 percent in the physical sciences, 8 percent 
in the ecology and earth sciences, and 4 percent in the 
social sciences. The disciplines are well defined with long 
persistence times; only around one percent of the time 
would she follow a citation across discipline boundaries, 
the traversal from the physical sciences to the life sciences 
being the most common of these. 

Using the fundamental mathematics of information theory to exploit the duality between compression and pattern detection, we have shown how to reveal the multilevel organization of networks. Combined with powerful visualizations, the hierarchical map equation provides a useful tool to comprehend the hierarchical organization of large multiscale social and biological systems.
Here we have focused on hard partitions and the flow 
of random walkers, but in a subsequent paper we will 
demonstrate the natural extension of the map equation 
to overlapping partitions and generalized flows. In short, 
we can capitalize on overlapping structures by modifying 
the code structure and releasing the constraint that a 
node can only belong to one module codebook. Because 
the codelength only depends on the rates of node vis- 
its and module transitions, the map equation framework 
is agnostic to the origin of the flow. Therefore, we can 
comprehend the organization in real systems for which a 
random walker is not a good proxy for flow through the 
system, by using a different model of flow or by directly 
measuring the real flow. 



Materials and Methods 

Here we provide a detailed description of the mathe- 
matics of the hierarchical map equation and outline the 
stochastic and recursive algorithm we have developed to 
search for the hierarchical partition of a network that 
minimizes the hierarchical map equation. We also de- 
scribe how we quantify the performance of our method 
with the relative mutual information of module and sub- 
module assignments between the benchmark networks 
and the hierarchical clustering generated by the algo- 
rithm. 
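The text does not spell out how the relative mutual information is normalized. As an assumption, the sketch below uses one common choice from the benchmark literature, $2I(X;Y)/(H(X)+H(Y))$. It equals 1 when two hard partitions agree up to relabeling and 0 when they share no information. The function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """2 I(A;B) / (H(A) + H(B)) between two hard assignments of the same nodes,
    given as equal-length sequences of module labels (one label per node)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty assignments of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    mutual = sum(
        c / n * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:
        return 1.0  # both assignments put every node in a single module
    return 2 * mutual / (h_a + h_b)
```

To score submodule assignments rather than module assignments, each node can be labeled by its (module, submodule) pair, so that equal submodule indices in different modules are not confused.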



The hierarchical map equation 

The hierarchical partitioning algorithm builds on the 
fast stochastic search algorithm presented in ref. (15), 
with two major differences. First, to explore multi- 
level solutions, the algorithm recursively tries to add 
extra index codebooks both at coarser and finer levels. 
Sometimes movements between modules can be further 
compressed by adding one or more coarser index code- 
books and sometimes movements within modules can 
be further compressed by adding one or more finer in- 
dex codebooks. In its search for the optimal hierarchi- 
cal partitioning, the algorithm successively increases and 
decreases the depth of different branches of the multi- 
level code structure. Second, to reduce the small co- 
hesive effect of random teleportation, the map equation 
only measures the description length of steps following 
links and not the steps associated with random tele- 
portation. In this way, the resolution increases slightly 
and the algorithm can better detect less-separated modules or submodules. The code is available here: http://www.tp.umu.se/~rosvall/code.html. Below we explain how we have implemented these differences.

To exclude random teleportation steps from the description length of directed networks, we first calculate the ergodic node visit frequencies $p_\alpha$ for $\alpha = 1, \dots, n$ with random teleportation at rate $\tau = 0.15$ as before. Then, for every node $\alpha$ and for all its outgoing links with relative weight $w_{\alpha\beta}$ to node $\beta$, we calculate the probability that the random surfer does not teleport but rather follows a link in a given step:

$$
q_{\alpha\to\beta} = (1-\tau)\, p_\alpha w_{\alpha\beta}. \tag{5}
$$

Note that, unlike in the ergodic case, the in- and outflow of a node no longer need to be equal. Finally, we update the node visit frequencies to exclude the contribution from random teleportation:

$$
p_\alpha = \sum_{\beta} q_{\beta\to\alpha}. \tag{6}
$$
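The two steps above (Eqs. 5 and 6) can be sketched as follows. Several details are illustrative assumptions, not the authors' implementation. The sketch assumes that teleportation targets nodes uniformly at random, that nodes without outgoing links always teleport, and that the ergodic visit rates are found by plain power iteration. The function name `link_flows` is hypothetical.

```python
def link_flows(n, links, tau=0.15, tol=1e-12, max_iter=10_000):
    """Sketch of Eqs. 5-6 for directed, weighted links.

    links: dict (alpha, beta) -> weight among nodes 0..n-1.
    Returns (ergodic visit rates, link flows q, link-only visit rates).
    """
    out_weight = [0.0] * n
    for (a, _), w in links.items():
        out_weight[a] += w

    # Ergodic visit rates with teleportation at rate tau (power iteration).
    p = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(p[a] for a in range(n) if out_weight[a] == 0)
        new = [(tau + (1 - tau) * dangling) / n] * n  # teleportation inflow
        for (a, b), w in links.items():
            new[b] += (1 - tau) * p[a] * w / out_weight[a]
        converged = sum(abs(x - y) for x, y in zip(new, p)) < tol
        p = new
        if converged:
            break

    # Eq. 5: flow along each link in steps that do not teleport.
    q = {(a, b): (1 - tau) * p[a] * w / out_weight[a]
         for (a, b), w in links.items()}
    # Eq. 6: visit rates that count only arrivals along links.
    p_link = [0.0] * n
    for (_, b), flow in q.items():
        p_link[b] += flow
    return p, q, p_link
```

Because teleportation steps are dropped, the link-only visit rates no longer sum to one. For the directed three-node cycle 0 → 1 → 2 → 0, for example, every node has ergodic rate 1/3 and link-only rate 0.85/3.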



For a given hierarchical network partition, the hierarchical map equation measures the per-step average minimal information necessary to track a random walker's movements along links on a network. Sometimes the random walker stays within the same finest-level submodule, and sometimes she moves up and down one or more levels in the hierarchy. At the coarsest level, the description length measures the information necessary to determine which coarsest-level module the random walker enters, weighted by how often such movements happen. With $q_{i\curvearrowleft}$ for the per-step flow into module $i$, the relative rate of codeword use is $\mathcal{Q} = \{q_{i\curvearrowleft}/q_{\curvearrowright}\} = q_{1\curvearrowleft}/q_{\curvearrowright}, q_{2\curvearrowleft}/q_{\curvearrowright}, \dots, q_{m\curvearrowleft}/q_{\curvearrowright}$, where

$$
q_{\curvearrowright} = \sum_{i=1}^{m} q_{i\curvearrowleft} \tag{7}
$$

is the per-step average flow into the modules and the total codeword use at the coarsest level. The Shannon information of movements at the coarsest level, weighted by the total use, is therefore

$$
q_{\curvearrowright} H(\mathcal{Q}) = q_{\curvearrowright} \left( -\sum_{i=1}^{m} \frac{q_{i\curvearrowleft}}{q_{\curvearrowright}} \log \frac{q_{i\curvearrowleft}}{q_{\curvearrowright}} \right). \tag{8}
$$

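At intermediate levels, to measure the contribution to the total code length in submodule $i$, it is sufficient to aggregate the flow associated with movements to coarser levels, $q_{i\curvearrowright}$, and the flow associated with movements into the $m^{i}$ submodules at the next finer level of the hierarchy, $\{q_{ij\curvearrowleft}\}_{j=1}^{m^{i}}$. The relative rate of codeword use is $\mathcal{Q}^{i} = q_{i\curvearrowright}/q^{i}_{\circlearrowright}, q_{i1\curvearrowleft}/q^{i}_{\circlearrowright}, \dots, q_{im^{i}\curvearrowleft}/q^{i}_{\circlearrowright}$, where

$$
q^{i}_{\circlearrowright} = q_{i\curvearrowright} + \sum_{j=1}^{m^{i}} q_{ij\curvearrowleft} \tag{9}
$$

is the total codeword use. The Shannon information of movements in this submodule, weighted by how often the code is used, is therefore

$$
q^{i}_{\circlearrowright} H(\mathcal{Q}^{i}) = q^{i}_{\circlearrowright} \left( -\frac{q_{i\curvearrowright}}{q^{i}_{\circlearrowright}} \log \frac{q_{i\curvearrowright}}{q^{i}_{\circlearrowright}} - \sum_{j=1}^{m^{i}} \frac{q_{ij\curvearrowleft}}{q^{i}_{\circlearrowright}} \log \frac{q_{ij\curvearrowleft}}{q^{i}_{\circlearrowright}} \right). \tag{10}
$$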



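At the finest levels, nodes rather than submodules are visited, and the relative rate of codeword use is $\mathcal{P}^{ij\dots k} = q_{ij\dots k\curvearrowright}/p^{ij\dots k}_{\circlearrowright}, \{p_\alpha/p^{ij\dots k}_{\circlearrowright}\}_{\alpha \in ij\dots k}$, where

$$
p^{ij\dots k}_{\circlearrowright} = q_{ij\dots k\curvearrowright} + \sum_{\alpha \in ij\dots k} p_\alpha \tag{11}
$$

is the total codeword use. The Shannon information of movements at the finest level, weighted by the total use of the code, therefore is

$$
p^{ij\dots k}_{\circlearrowright} H(\mathcal{P}^{ij\dots k}) = p^{ij\dots k}_{\circlearrowright} \left( -\frac{q_{ij\dots k\curvearrowright}}{p^{ij\dots k}_{\circlearrowright}} \log \frac{q_{ij\dots k\curvearrowright}}{p^{ij\dots k}_{\circlearrowright}} - \sum_{\alpha \in ij\dots k} \frac{p_\alpha}{p^{ij\dots k}_{\circlearrowright}} \log \frac{p_\alpha}{p^{ij\dots k}_{\circlearrowright}} \right). \tag{12}
$$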

Adding the contribution from every module at all levels gives the total description length, which is quantified by the hierarchical map equation. For a hierarchical map $\mathsf{M}$ of $n$ nodes partitioned into $m$ modules, for which each module $i$ has a submap $\mathsf{M}^{i}$ with $m^{i}$ submodules, for which each submodule $ij$ has a submap $\mathsf{M}^{ij}$ with $m^{ij}$ submodules, and so on, the hierarchical map equation takes the form

$$
L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{m} L(\mathsf{M}^{i}), \tag{13}
$$

with the description length of submap $\mathsf{M}^{i}$ at intermediate levels given by

$$
L(\mathsf{M}^{i}) = q^{i}_{\circlearrowright} H(\mathcal{Q}^{i}) + \sum_{j=1}^{m^{i}} L(\mathsf{M}^{ij}) \tag{14}
$$

and at the finest modular level by

$$
L(\mathsf{M}^{ij\dots k}) = p^{ij\dots k}_{\circlearrowright} H(\mathcal{P}^{ij\dots k}). \tag{15}
$$
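Consider a hierarchy with no intermediate levels, so that every coarsest-level module $i$ is also a finest-level module. This case shows how the two-level map equation arises as a special case. Each $L(\mathsf{M}^{i})$ is then given directly by Eq. 15, and Eq. 13 collapses to the form of Eq. 1:

$$
L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{m} L(\mathsf{M}^{i})
= q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{m} p^{i}_{\circlearrowright} H(\mathcal{P}^{i}),
\qquad
p^{i}_{\circlearrowright} = q_{i\curvearrowright} + \sum_{\alpha \in i} p_\alpha .
$$

When the flow into each module equals the flow out of it, as for the random walker on an undirected network, $q_{\curvearrowright} = \sum_i q_{i\curvearrowleft} = \sum_i q_{i\curvearrowright}$ is exactly the module-switching rate used in Eq. 1.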

Fast stochastic and recursive search algorithm 

The hierarchical map equation measures the per-step 
average code length necessary to describe a random 
walker's link movements on a network, given a hierar- 
chical network partition, but the challenge is to find the 
partition that minimizes the description length. Into how 
many hierarchical levels should a given network be parti- 
tioned? How many modules should each level have? And 
which nodes should be members of which modules? 

We have generalized our search algorithm for the two- 
level map equation to recursively search for multilevel 
solutions. The recursive search operates on a module at 
any level; this can be all the nodes in the entire network, 
or a few nodes at the finest level. For a given module, 
the algorithm first generates submodules if this gives a 
shorter description length. If not, the recursive search 
does not go further down this branch. But if adding submodules gives a shorter description length, the algorithm tests if movements within the module can be further compressed by additional index codebooks. Further compression can be achieved both by adding one
or more coarser codebooks to compress movements be- 
tween submodules or by adding one or more finer index 
codebooks to compress movements within submodules. 
To test for all combinations, the algorithm calls itself re- 
cursively, both operating on the network formed by the 
submodules and on the networks formed by the nodes 
within every submodule. In this way, the algorithm suc- 
cessively increases and decreases the depth of different 
branches of the multilevel code structure in its search for 
the optimal hierarchical partitioning. For every split of 
a module into submodules, we use the search algorithm 
detailed in ref. (15) and described again here. 

Any greedy (fast but inaccurate) or Monte Carlo-based 
(accurate but slow) approach can be used to minimize 
the map equation. To provide a good balance between 
the two extremes, we developed a fast stochastic and re- 
cursive search algorithm, implemented it in C++, and 
made it available online both for directed and undirected 
weighted networks (28). As a reference, the new algo- 
rithm is as fast as the previous high-speed algorithms 
(the greedy search presented in the supporting appendix 
of ref. (14)), which were based on the method introduced 
in ref. (29) and refined in ref. (30). At the same time, 
it is also more accurate than our previous high-accuracy 
algorithm (a simulated annealing approach) presented in 
the same supporting appendix. 

The core of the algorithm follows closely the method 
presented in ref. (31): neighboring nodes are joined into 
modules, which subsequently are joined into supermod- 
ules, and so on. First, each node is assigned to its own 
module. Then, in random sequential order, each node 
is moved to the neighboring module that results in the 
largest decrease of the map equation. If no move results 
in a decrease of the map equation, the node stays in its 
original module. This procedure is repeated, each time in 
a new random sequential order, until no move generates a 
decrease of the map equation. Now the network is rebuilt, 
with the modules of the last level forming the nodes at 
this level, and, exactly as at the previous level, the nodes 
are joined into modules. This hierarchical rebuilding of 
the network is repeated until the map equation cannot be 
reduced further. Except for the random sequence order, 
this is the algorithm described in ref. (31). 
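The core algorithm described above can be sketched as follows. This is a deliberately naive, hypothetical illustration, not the authors' C++ implementation. It assumes an undirected, unweighted graph and recomputes the full two-level map equation after every trial move instead of updating it incrementally. It also performs the rebuilding step implicitly: instead of constructing an aggregated network, it moves whole groups of original nodes, namely the modules found at the previous level.

```python
import math
import random
from collections import defaultdict


def map_equation(edges, partition):
    """Two-level map equation (bits per step) for an undirected, unweighted graph;
    rates come from link counts normalized by the total degree."""
    total = 2 * len(edges)
    degree, exits = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            exits[partition[u]] += 1
            exits[partition[v]] += 1
    node_rates = defaultdict(list)  # module -> visit rates of its nodes
    for node, module in partition.items():
        node_rates[module].append(degree[node] / total)

    def weighted_entropy(rates):  # total * H(rates / total), skipping zeros
        s = sum(rates)
        return -sum(r * math.log2(r / s) for r in rates if r > 0) if s > 0 else 0.0

    index_term = weighted_entropy([exits[m] / total for m in node_rates])
    module_terms = sum(weighted_entropy([exits[m] / total] + rates)
                       for m, rates in node_rates.items())
    return index_term + module_terms


def core_algorithm(edges, nodes, seed=0):
    """Greedy local moving plus repeated aggregation (hypothetical sketch)."""
    rng = random.Random(seed)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    partition = {a: a for a in nodes}   # each node starts in its own module
    groups = [[a] for a in nodes]       # units that move jointly at this level
    while True:
        level_improved = False
        moved = True
        while moved:                    # repeat until no move decreases L
            moved = False
            rng.shuffle(groups)         # new random sequential order each pass
            for group in groups:
                current = partition[group[0]]
                best_module = current
                best_length = map_equation(edges, partition)
                candidates = {partition[b] for a in group for b in neighbors[a]}
                for module in candidates - {current}:
                    for a in group:
                        partition[a] = module
                    length = map_equation(edges, partition)
                    if length < best_length - 1e-12:
                        best_module, best_length = module, length
                for a in group:         # apply the best move (or stay put)
                    partition[a] = best_module
                if best_module != current:
                    moved = level_improved = True
        if not level_improved:
            return partition
        # Rebuild: the modules of this level become the units of the next level.
        members = defaultdict(list)
        for a in nodes:
            members[partition[a]].append(a)
        groups = list(members.values())
```

For two disjoint triangles, for example, this sketch ends with one module per triangle for any seed. Merging the pieces of a split triangle always removes exit codewords, and nodes only ever move to neighboring modules. On less clear-cut networks the result depends on the random order, which is why restarting from scratch and keeping the shortest description, as described below, matters.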

With this algorithm, a fairly good clustering of the network can be found in a very short time. Let us call this the core algorithm and see how it can be improved. The nodes assigned to the same module are forced to move jointly when the network is rebuilt. As a result, what was an optimal move early in the algorithm might have the opposite effect later in the algorithm. Because two or more modules that merge together and form one single module when the network is rebuilt can never be separated again in this algorithm, the accuracy can be improved by breaking the modules of the final state of the core algorithm in either of the two following ways:

Submodule movements. First, each module is
treated as a network on its own and the core
algorithm is applied to this network. This procedure
generates one or more submodules for each module.
Then all submodules are moved back to their
respective modules of the previous step. At this
stage, with the same partition as in the previous
step but with each submodule being freely movable
between the modules, the core algorithm is
re-applied.

Single-node movements. First, each node is
reassigned to be the sole member of its own module,
in order to allow for single-node movements.
Then all nodes are moved back to their respective
modules of the previous step. At this stage, with
the same partition as in the previous step but with
each single node being freely movable between the
modules, the core algorithm is re-applied.
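Both the rebuilding step and the two extensions above treat groups of nodes as single movable units. A minimal sketch of the bookkeeping this requires follows. It assumes an undirected, weighted network without self-loops, stored as a dictionary keyed by node pairs (a hypothetical format). The hypothetical function `aggregate` collapses the network onto a partition, and the hypothetical `lift_submodules` places each submodule back in the module its nodes occupied in the previous step.

```python
from collections import defaultdict


def aggregate(edges, module):
    """Collapse a weighted, undirected network onto a partition.

    edges: dict mapping frozenset({u, v}) -> link weight (assumes u != v).
    module: dict mapping each node to its module label.
    Returns (rebuilt, internal): links between modules with parallel
    weights summed, and the total link weight inside each module.
    """
    rebuilt = defaultdict(float)
    internal = defaultdict(float)
    for link, weight in edges.items():
        u, v = tuple(link)
        mu, mv = module[u], module[v]
        if mu == mv:
            internal[mu] += weight
        else:
            rebuilt[frozenset((mu, mv))] += weight
    return dict(rebuilt), dict(internal)


def lift_submodules(module, submodule):
    """Starting assignment for submodule movements: each submodule (the new
    movable unit) is placed in the module its nodes came from.
    Assumes submodule labels are unique across modules."""
    start = {}
    for node, sub in submodule.items():
        previous = module[node]
        if start.setdefault(sub, previous) != previous:
            raise ValueError(f"submodule {sub!r} spans several modules")
    return start
```

The core algorithm is then re-applied with the submodules (or single nodes) as the units that move, starting from the assignment returned by `lift_submodules`.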

In practice, we repeat the two extensions to the core 
algorithm in sequence and as long as the clustering is 
improved. Moreover, we apply the submodule move- 
ments recursively. That is, to find the submodules to 
be moved, the algorithm first splits the submodules into 
subsubmodules, subsubsubmodules, and so on until no 
further splits are possible. Finally, because the algorithm 
is stochastic and fast, we can restart the algorithm from 
scratch every time the clustering cannot be improved fur- 
ther and the algorithm stops. The implementation is 
straightforward and, by repeating the search more than 
once, 100 times or more if possible, the final partition is 
less likely to correspond to a local minimum. For each it- 
eration, we record the clustering if the description length 
is shorter than the previous shortest description length. 
In practice, for networks with on the order of 10,000 
nodes and 1,000,000 directed and weighted links, each 
iteration takes a few seconds on a modern laptop. 
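The restart strategy amounts to keeping the best of many independent runs. In the minimal sketch below, `search` stands for any stochastic search that returns a partition and its description length for a given random seed (a hypothetical interface). `toy_search` is a stand-in used only for illustration.

```python
import random
from typing import Callable, Dict, Hashable, Optional, Tuple

Partition = Dict[Hashable, Hashable]


def best_of_restarts(
    search: Callable[[int], Tuple[Partition, float]], trials: int = 100
) -> Tuple[Optional[Partition], float]:
    """Run `search` from scratch `trials` times, one seed per run, and keep
    the partition with the shortest description length seen so far."""
    best_partition: Optional[Partition] = None
    best_length = float("inf")
    for seed in range(trials):
        partition, length = search(seed)
        if length < best_length:  # record only strict improvements
            best_partition, best_length = partition, length
    return best_partition, best_length


def toy_search(seed: int) -> Tuple[Partition, float]:
    """Hypothetical stand-in for one full run of the search algorithm:
    returns a random two-node partition and a random 'description length'."""
    rng = random.Random(seed)
    partition = {"a": 0, "b": rng.randint(0, 1)}
    return partition, rng.uniform(2.0, 3.0)
```

Seeding each run separately keeps the runs independent and makes any recorded optimum reproducible.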

Performance test of the hierarchical map equation 

To test the performance of our algorithm, we used 
the benchmark paradigm developed by Lancichinetti
and Fortunato (19). They have provided an extension
of their algorithm to generate benchmark networks
with an extra submodular level and made it available
here: http://sites.google.com/site/santofortunato/inthepress2.
But before detailing the
performance test, we follow the reasoning in ref. (19) 
and provide an approximate relationship between a well- 
defined hierarchical structure and the coarse- and fine- 
level mixing parameters. 

From a topological point of view, a three-level
hierarchical structure is well defined if

$$p_3 > p_2 > p_1, \tag{16}$$
where $p_3$ is the probability that a random link connects
two nodes in the same fine-level module, $p_2$ is the
probability that it connects two nodes in different fine-level
modules but the same coarse-level module, and $p_1$ is the
probability that it connects two nodes in different
coarse-level modules. We can estimate these probabilities,
given the expected number of links a node $i$ shares with
nodes within the same fine-level module, $k_i^3$, with nodes
within the same coarse-level module but different fine-level
modules, $k_i^2$, and with nodes in other coarse-level
modules, $k_i^1$. We do this by approximating the number of
available links within the same fine-level module by
$n_3\langle k\rangle$, where $n_3$ is the number of nodes in the
fine-level module and $\langle k\rangle$ is the average degree of
nodes in the network. The corresponding approximation
within the coarse-level module, outside the fine-level
module, is $(n_2 - n_3)\langle k\rangle$, where $n_2$ is the number
of nodes in the coarse-level module. The approximation for
available links in other coarse-level modules is
$(n_1 - n_2)\langle k\rangle$, where $n_1$ is the number of nodes
in the full network. Now we have
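$$p_3 = \frac{k_i^3}{n_3\langle k\rangle}, \tag{17}$$
$$p_2 = \frac{k_i^2}{(n_2 - n_3)\langle k\rangle}, \tag{18}$$
$$p_1 = \frac{k_i^1}{(n_1 - n_2)\langle k\rangle}. \tag{19}$$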



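The mixing parameters $\mu_1$ and $\mu_2$ are defined as follows:

$$1 - \mu_1 - \mu_2 = \frac{k_i^3}{k_i^1 + k_i^2 + k_i^3}, \tag{20}$$
$$\mu_2 = \frac{k_i^2}{k_i^1 + k_i^2 + k_i^3}, \tag{21}$$
$$\mu_1 = \frac{k_i^1}{k_i^1 + k_i^2 + k_i^3}, \tag{22}$$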
such that nodes share on average a fraction $\mu_1$ of their
links with nodes in other modules, a fraction $\mu_2$ of their
links with nodes in other submodules of the same module,
and the remaining fraction $1 - \mu_1 - \mu_2$ of their links
with nodes in the same submodule. Now we have the
information to determine where the full hierarchical
structure is well defined. Combining eqs. (16-22) yields
the relationship
$$\frac{1 - \mu_1 - \mu_2}{n_3} > \frac{\mu_2}{n_2 - n_3} > \frac{\mu_1}{n_1 - n_2}. \tag{23}$$
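As a numerical check of eqs. (17)-(23), the minimal sketch below computes the link probabilities and mixing parameters for a single node and tests the inequality directly. The function names and the example link counts and module sizes are hypothetical, chosen only for illustration.

```python
def link_probabilities(k3, k2, k1, n3, n2, n1, mean_degree):
    """Eqs. (17)-(19): approximate probabilities that an available link of
    node i lands in its fine-level module (p3), elsewhere in its
    coarse-level module (p2), or in another coarse-level module (p1)."""
    p3 = k3 / (n3 * mean_degree)
    p2 = k2 / ((n2 - n3) * mean_degree)
    p1 = k1 / ((n1 - n2) * mean_degree)
    return p3, p2, p1


def mixing_parameters(k3, k2, k1):
    """Eqs. (21)-(22): coarse-level (mu1) and fine-level (mu2) mixing;
    eq. (20) then gives the within-submodule fraction 1 - mu1 - mu2."""
    total = k1 + k2 + k3
    return k1 / total, k2 / total


def satisfies_eq23(mu1, mu2, n3, n2, n1):
    """Eq. (23). Equivalent to p3 > p2 > p1, because every p_j in
    eqs. (17)-(19) shares the positive factor (k1 + k2 + k3) / <k>."""
    return (1 - mu1 - mu2) / n3 > mu2 / (n2 - n3) > mu1 / (n1 - n2)
```

For example, a node with $k_i^3 = 15$, $k_i^2 = 4$, $k_i^1 = 1$ in modules with $n_3 = 50$, $n_2 = 1{,}000$, $n_1 = 10{,}000$ and $\langle k\rangle = 20$ has $\mu_1 = 0.05$ and $\mu_2 = 0.2$, and $p_3 = 0.015 > p_2 \approx 2.1\times 10^{-4} > p_1 \approx 5.6\times 10^{-6}$, so (23) holds.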



The two inequalities correspond to two lines in the
$\mu_1$-$\mu_2$ plane, determined by the extreme values of
$n_3$, $n_2$, and $n_1$. For a well-defined three-level
hierarchical structure, $\mu_2$ must be larger than

$$\frac{n_{2\uparrow} - n_{3\downarrow}}{n_1 - n_{2\uparrow}}\,\mu_1 \tag{24}$$

and smaller than

$$\frac{n_{2\downarrow} - n_{3\uparrow}}{n_{2\downarrow}}\,(1 - \mu_1). \tag{25}$$

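Here $n_{3\downarrow}$ and $n_{3\uparrow}$ are the smallest and the
largest number of nodes a fine-level module can have,
with the same notation for the coarse-level modules. The
lower bound (24) follows from the right-hand inequality
in (23), and the upper bound (25) from the left-hand
inequality, which rearranges to
$\mu_2 < (1 - n_3/n_2)(1 - \mu_1)$; requiring both to hold
for every allowed module size selects the extreme sizes.
Figure 4 shows the range of mixing parameters that
correspond to a well-defined three-level hierarchical
structure, for the values we have used in the benchmark test.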
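Eqs. (24) and (25) can be evaluated directly. Below is a minimal sketch with hypothetical function names. It takes as input the network size and the module-size ranges given in the Figure 4 caption.

```python
def mu2_bounds(mu1, n1, n2_min, n2_max, n3_min, n3_max):
    """Eqs. (24)-(25): open interval of fine-level mixing mu2 that keeps a
    three-level hierarchy well defined for every allowed module size."""
    lower = (n2_max - n3_min) / (n1 - n2_max) * mu1
    upper = (n2_min - n3_max) / n2_min * (1 - mu1)
    return lower, upper


def max_coarse_mixing(n1, n2_min, n2_max, n3_min, n3_max):
    """Largest mu1 for which the interval is non-empty: the value where the
    lines a * mu1 (eq. 24) and b * (1 - mu1) (eq. 25) cross."""
    a = (n2_max - n3_min) / (n1 - n2_max)
    b = (n2_min - n3_max) / n2_min
    return b / (a + b)
```

With $n_1 = 10{,}000$, $n_{2\downarrow} = 400$, $n_{2\uparrow} = 4{,}000$, $n_{3\downarrow} = 10$ and $n_{3\uparrow} = 100$, the coefficients are $3990/6000 = 0.665$ and $300/400 = 0.75$. For $\mu_1 = 0.2$ this gives $0.133 < \mu_2 < 0.6$, and by this calculation the admissible range closes at $\mu_1 = 0.75/1.415 \approx 0.53$.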




[Figure 4 plot; horizontal axis: coarse-level mixing $\mu_1$]

FIG. 4 The range of mixing parameters that give
a well-defined three-level hierarchical structure for
the benchmark networks in the paper. The networks
have $n_1 = 10{,}000$ nodes, coarse-level module sizes
between $n_{2\downarrow} = 400$ and $n_{2\uparrow} = 4{,}000$ nodes,
and fine-level module sizes between $n_{3\downarrow} = 10$ and
$n_{3\uparrow} = 100$ nodes. The connected points illustrate the
sets of mixing parameters we present in the paper.



To quantify the performance of our method, we use 
the relative mutual information (32) and measure how 
much we learn about the true benchmark partitions by 
studying the inferred partitions that we get by applying 
the hierarchical map equation. We independently com- 
pare the coarse and fine levels of the benchmark networks 
with the multilevel partitioning inferred by the map equa- 
tion. That is, we compare the first-level modules of the 
benchmark networks with the first-level modules of the 
inferred modules and the second-level submodules of the 
benchmark networks with the finest-level submodules of 
the inferred modules. Note that with this approach, the 
finest-level submodules do not need to be at the second 
level in the inferred structure. Therefore, we also mea- 
sured the per-node average depth of the hierarchy to pick 
up information about how many levels were detected. 
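The level-by-level comparison can be sketched as follows. The sketch assumes, hypothetically, that the inferred hierarchy is given as one path per node: a tuple of module indices from the top level down to the finest module containing the node. The first entry identifies the first-level module. The full tuple identifies the finest-level submodule wherever it sits in the tree. The tuple length is taken as that node's depth. The function name `compare_levels` is illustrative.

```python
from statistics import mean


def compare_levels(paths):
    """paths: dict node -> tuple of module indices, top level first
    (hypothetical format; the node's own index is not included).

    Returns first-level labels, finest-level labels (the whole path, so
    equal submodule indices under different parents stay distinct), and
    the per-node average depth of the hierarchy."""
    coarse = {node: path[0] for node, path in paths.items()}
    finest = {node: path for node, path in paths.items()}
    depth = mean(len(path) for path in paths.values())
    return coarse, finest, depth
```

The two label dictionaries can then be compared separately with the coarse and fine levels of the benchmark partition.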

To calculate the relative mutual information, we label
every node by its module number. In this way, picking a
random node and reading off its module number
corresponds to sampling from the discrete random
variable $X$ with probability distribution
$P(X) = \{n_1/n, n_2/n, \ldots, n_m/n\}$, where $n$ is the
number of nodes, $n_x$ is the number of nodes in module
$x$, and $m$ is the number of modules. The average
information necessary to describe the random variable,
the Shannon entropy of $X$, is accordingly
$$H(X) = -\sum_{x=1}^{m} \frac{n_x}{n}\log\frac{n_x}{n}. \tag{26}$$



With X for the benchmark partition, Y for the algo- 
rithm partition, and n xy for the number of nodes that 
are jointly partitioned in module x and module y, the 
mutual information is 
$$I(X;Y) = \sum_{x,y} \frac{n_{xy}}{n}\log\frac{n_{xy}\,n}{n_x n_y}. \tag{27}$$



Finally, the normalized mutual information (32), which
ranges from 0 for independent partitions to 1 for
identical partitions, is
$$R(X;Y) = \frac{2I(X;Y)}{H(X) + H(Y)}. \tag{28}$$
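Eqs. (26)-(28) translate directly into code. Below is a minimal sketch with hypothetical function names, where each partition is a list of module labels indexed by node. The base-2 logarithm is an arbitrary choice, since the base cancels in $R(X;Y)$. Returning 1 when both partitions consist of a single module is a convention we assume, because eq. (28) is then 0/0.

```python
import math
from collections import Counter


def entropy(labels):
    """Eq. (26): Shannon entropy (bits) of the module label of a random node."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Eq. (27): I(X;Y) from the joint counts n_xy of two partitions."""
    n = len(x)
    nx, ny, nxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * math.log2(c * n / (nx[a] * ny[b]))
               for (a, b), c in nxy.items())


def normalized_mutual_information(x, y):
    """Eq. (28): 0 for independent partitions, 1 for identical ones."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:  # assumed convention: both are single-module partitions
        return 1.0
    return 2 * mutual_information(x, y) / (hx + hy)
```

Relabeling modules does not change the result: `[0, 0, 0, 1, 1, 1]` and `[5, 5, 5, 7, 7, 7]` give $R = 1$, while `[0, 0, 1, 1]` and `[0, 1, 0, 1]` are independent and give $R = 0$.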



We used scale-free networks (exponent -2) with 10,000 
nodes, average degree 20, and maximum degree 100, and 
let the module sizes vary between 400 and 4,000 nodes 
and the submodule sizes between 10 and 100 nodes, both 
with a scale-free size distribution (exponent -1). Figure 5 
shows the result of the benchmark test. The performance 
is excellent as long as the hierarchical organization is well 
defined and nodes have strictly more links within than 
between fine-level modules and more links within than 
between coarse-level modules; otherwise, the well-defined 
range is too narrow. Because of fluctuations in the bench- 
mark networks, the levels interweave close to the limits 
of well-defined modules and the algorithm can only ex- 
tract the fine-level modules. Overall, the results are on 
par with what we have obtained for two-level benchmark 
networks (19). 